Next-generation ETL Framework to Address the Challenges Posed by Big Data

Author

  • Syed Muhammad Fawad Ali
Abstract

The specific features of Big Data, i.e., variety, volume, and velocity, call for special measures to create ETL data pipelines and data warehouses. A rapidly growing need for analyzing Big Data calls for novel architectures for warehousing the data, such as data lakes or polystores. In both architectures, ETL processes serve purposes similar to those in traditional data warehouse architectures, except that the data to be processed come in a multitude of formats and the relationships between data are often very complex. Furthermore, data transformations are often required on the fly and have to be executed and completed in near real time. For these reasons, designing and optimizing ETL workflows for Big Data is much more difficult than for traditional data. In this paper, we focus on the ETL aspect of Big Data and propose an extendable ETL workflow that addresses the aforementioned challenges posed by Big Data.

Related resources

Modern Data Formats for Big Bioinformatics Data Analytics

Next Generation Sequencing (NGS) technology has resulted in massive amounts of proteomics and genomics data. This data is of no use if it is not properly analyzed. ETL (Extraction, Transformation, Loading) is an important step in designing data analytics applications. ETL requires proper understanding of features of data. Data format plays a key role in understanding of data, representation of ...

Formal Methods for Preserving Privacy for Big Data Extraction Software

Given the inexpensive nature and increasing availability of information storage media, businesses, government agencies, healthcare professionals, and individuals worldwide have exponentially increased their production and persistence of large amounts of data whether such data are captured as text, images, or sound. As such, it is no surprise that just coining the term “Big Data” has generated a...

Big Data Generation

Big data challenges are end-to-end problems. When handling big data it usually has to be preprocessed, moved, loaded, processed, and stored many times. This has led to the creation of big data pipelines. Current benchmarks related to big data only focus on isolated aspects of this pipeline, usually the processing, storage and loading aspects. To this date, there has not been any benchmark prese...

Extending ETL framework using service oriented architecture

Extraction, Transformation and Loading (ETL) represent a big portion of a data warehouse project. The complexity of component extensibility is a main problem in the ETL area, because ETL components are tightly coupled to each other in the current ETL framework. The missing extensibility feature impedes adding new components to the current ETL framework to meet special business needs. ...

Interoperable Distributed Data Warehouse Components

Extraction, Transformation and Loading (ETL) are the major functionalities in data warehouse (DW) solutions. Lack of component distribution and interoperability is a gap that leads to many problems in the ETL domain, because these ETL components are tightly-coupled in the current ETL framework. Furthermore, complexity of components extensibility is another gap in the ETL area, because of the sa...

Publication date: 2018